Finding the Maximum Likelihood Tree is Hard
Abstract
Maximum likelihood (ML) is an increasingly popular optimality criterion for selecting evolutionary trees (Felsenstein, 1981). Finding optimal ML trees appears to be a very hard computational task; in particular, algorithms and heuristics for ML take longer to run than algorithms and heuristics for the second major character-based criterion, maximum parsimony (MP). Still, for tractable cases, ML is the method of choice. However, while MP has been known to be NP-complete for over 20 years (Graham and Foulds, 1982; Day, Johnson, and Sankoff, 1986), a comparable hardness result for ML has so far eluded researchers in the field. Important work by Tuffley and Steel (1997) proves quantitative relations between the parsimony values of given sequences and the corresponding log-likelihood values. However, a direct application of their work would only give an exponential-time reduction from MP to ML. Another step in this direction was recently made by Addario-Berry et al. (2004), who proved that ancestral maximum likelihood (AML) is NP-complete. AML “lies in between” the two problems, having some properties of MP and some of ML. Still, the AML proof is not directly applicable to the ML problem. We resolve the question, showing that “regular” ML on phylogenetic trees is indeed intractable. Our reduction follows the vertex cover reductions for MP (Day et al.) and AML (Addario-Berry et al.), but its starting point is an approximation version of vertex cover known as gap VC. The crux of our work is not the reduction but its correctness proof. The proof goes through a series of tree modifications while controlling the likelihood loss at each step, using the bounds of Tuffley and Steel. The proof can be viewed as relating the value of any ML solution to an arbitrarily close approximation of vertex cover.
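To make the two criteria concrete, here is a minimal sketch that scores a single binary character on one fixed tree under each: the Fitch small-parsimony count used by MP, and the likelihood under a two-state symmetric (Neyman) substitution model, computed with Felsenstein's pruning algorithm. The tree `((A,B),(C,D))`, the single shared branch length `t`, and the helper names `fitch_score`, `partial_likelihood`, `log_likelihood` and `TREE` are hypothetical choices for illustration, not the construction used in the paper. Evaluating either score on a fixed tree with given branch lengths takes polynomial time; the hardness results above concern finding the best tree, where the search ranges over topologies and, for ML, over branch lengths as well.

```python
import math

# Illustrative sketch only. A rooted binary tree is either a leaf name (str)
# or a pair (left, right). The example topology ((A,B),(C,D)) is hypothetical.
TREE = (("A", "B"), ("C", "D"))


def fitch_score(tree, char):
    """Return (state_set, changes) for one character under Fitch parsimony."""
    if isinstance(tree, str):
        return {char[tree]}, 0
    left_set, left_cost = fitch_score(tree[0], char)
    right_set, right_cost = fitch_score(tree[1], char)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # Empty intersection: one extra substitution is needed below this node.
    return left_set | right_set, left_cost + right_cost + 1


def transition(same, t):
    """Two-state symmetric (Neyman) transition probability on a branch of length t."""
    p_change = 0.5 * (1.0 - math.exp(-2.0 * t))
    return 1.0 - p_change if same else p_change


def partial_likelihood(tree, char, t):
    """Felsenstein pruning: entry s is P(leaf data below this node | node state s).

    Assumption for simplicity: every branch has the same length t.
    """
    if isinstance(tree, str):
        return [1.0 if char[tree] == s else 0.0 for s in (0, 1)]
    children = [partial_likelihood(child, char, t) for child in tree]
    result = []
    for s in (0, 1):
        value = 1.0
        for child_l in children:
            # Sum over the child's state x, weighting by the branch transition.
            value *= sum(transition(s == x, t) * child_l[x] for x in (0, 1))
        result.append(value)
    return result


def log_likelihood(tree, char, t):
    """Log likelihood of one character, with a uniform distribution at the root."""
    root = partial_likelihood(tree, char, t)
    return math.log(0.5 * root[0] + 0.5 * root[1])


if __name__ == "__main__":
    # Pattern 0011 fits ((A,B),(C,D)) with one change; 0101 needs two.
    for pattern in ({"A": 0, "B": 0, "C": 1, "D": 1},
                    {"A": 0, "B": 1, "C": 0, "D": 1}):
        _, changes = fitch_score(TREE, pattern)
        print(changes, log_likelihood(TREE, pattern, 0.1))
```

With short branches, the character that needs fewer changes also gets the higher likelihood, which hints at why parsimony and likelihood values can be related quantitatively; the precise bounds are those of Tuffley and Steel, which this sketch does not reproduce.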
Similar resources
Maximum Likelihood Bounded Tree-Width Markov Networks
We study the problem of projecting a distribution onto (or finding a maximum likelihood distribution among) Markov networks of bounded tree-width. By casting it as the combinatorial optimization problem of finding a maximum weight hypertree, we prove that it is NP-hard to solve exactly and provide an approximation algorithm with a provable performance guarantee.
Evaluation of estimation methods for parameters of the probability functions in tree diameter distribution modeling
One of the most commonly used statistical models for characterizing the variation of tree diameter at breast height is the Weibull distribution. The usual approach for estimating the parameters of a statistical model is maximum likelihood estimation (the likelihood method), which usually relies on iterative algorithms such as Newton-Raphson. However, the efficiency of the likelihood method is not...
Comparison of Artificial Neural Network, Decision Tree and Bayesian Network Models in Regional Flood Frequency Analysis using L-moments and Maximum Likelihood Methods in Karkheh and Karun Watersheds
Proper flood discharge forecasting is significant for the design of hydraulic structures, reducing the risk of failure, and minimizing downstream environmental damage. The objective of this study was to investigate the application of machine learning methods in Regional Flood Frequency Analysis (RFFA). To achieve this goal, 18 physiographic, climatic, lithological, and land use parameters were ...
Finding a path is harder than finding a tree
I consider the problem of learning an optimal path graphical model from data and show the problem to be NP-hard for the maximum likelihood and minimum description length approaches and a Bayesian approach. This hardness result holds despite the fact that the problem is a restriction of the polynomially solvable problem of finding the optimal tree graphical model.
Ancestral Maximum Likelihood of Evolutionary Trees Is Hard
Maximum likelihood (ML) (Neyman, 1971) is an increasingly popular optimality criterion for selecting evolutionary trees. Finding optimal ML trees appears to be a very hard computational task; in particular, algorithms and heuristics for ML take longer to run than algorithms and heuristics for maximum parsimony (MP). However, while MP has been known to be NP-complete for over 20 years, no such h...
Operations Research Ph.D. Final Exam
Phylogenetics is the study of evolutionary relations between different organisms. Phylogenetic trees are the representations of these relations. Researchers have been working on finding fast and systematic approaches to reconstruct phylogenetic trees from observed data for over 40 years. It has been shown that, given a certain criterion to evaluate each tree, finding the best fitted phylogeneti...